Video encoding and decoding methods and corresponding devices

ABSTRACT

The invention relates to the field of video compression and, more specifically, to a video encoding method applied to an input sequence of frames in which each frame is subdivided into blocks of arbitrary size. This method comprises, for at least a part of the blocks of the current frame, the steps of: generating on a block basis motion-compensated frames obtained from each current original frame and a previous reconstructed frame; generating residual signals from said motion-compensated frames; using a matching pursuit algorithm for decomposing each of the generated residual signals into coded dictionary functions called atoms, the other blocks of the current frame being processed by means of other coding techniques; coding said atoms and the motion vectors determined during the motion compensation step, for generating an output coded bitstream; said method being such that any atom acts only on one block B at a time, said block-restriction leading to the fact that the reconstruction of a residual signal f is obtained from a dictionary that is composed of basis functions $g_{\gamma_n}|_B$ restricted to the block B corresponding to the indexing parameter $\gamma_n$, according to the following 2D spatial domain operation: $g_{\gamma_n}|_B(i,j) = g_{\gamma_n}(i,j)$ if pixel $(i,j) \in B$; $g_{\gamma_n}|_B(i,j) = 0$ otherwise (i.e. $(i,j) \notin B$).

FIELD OF THE INVENTION

The present invention generally relates to the field of video compression and, for instance, more particularly to the video standards of the MPEG family (MPEG-1, MPEG-2, MPEG-4) and to the video coding recommendations of the ITU H.26X family (H.261, H.263 and extensions). More specifically, the invention relates to a video encoding method applied to an input sequence of frames in which each frame is subdivided into blocks of arbitrary size, said method comprising for at least a part of said blocks of the current frame the steps of:

-   generating on a block basis motion-compensated frames, each one being obtained from each current original frame and a previous reconstructed frame;
-   generating from said motion-compensated frames residual signals;
-   using a so-called matching pursuit (MP) algorithm for decomposing each of said generated residual signals into coded dictionary functions called atoms, the other blocks of the current frame being processed by means of other coding techniques;
-   coding said atoms and the motion vectors determined during the motion compensation step, for generating an output coded bitstream.

The invention also relates to a corresponding video decoding method and to the encoding and decoding devices for carrying out said encoding and decoding methods.

BACKGROUND OF THE INVENTION

In the current video standards (up to the MPEG-4 video coding standard and the H.264 recommendation), the video, described in terms of one luminance channel and two chrominance ones, can be compressed thanks to two coding modes applied to each channel: the “intra” mode, exploiting in a given channel the spatial redundancy of the pixels (picture elements) within each image, and the “inter” mode, exploiting the temporal redundancy between separate images (or frames). The inter mode, relying on a motion compensation operation, allows an image to be described from one (or more) previously decoded image(s) by encoding the motion of the pixels from one (or more) image(s) to another one. Usually, the current image to be coded is partitioned into independent blocks (for instance, of size 8×8 or 16×16 pixels in MPEG-4, or of size 4×4, 4×8, 8×4, 8×8, 8×16, 16×8 and 16×16 in H.264), each of them being assigned a motion vector (the three channels share such a motion description). A prediction of said image can then be constructed by displacing pixel blocks from a reference image according to the set of motion vectors associated with each block. Finally, the difference, or residual signal, between the current image to be encoded and its motion-compensated prediction can be encoded in the intra mode (with 8×8 discrete cosine transforms, or DCTs, for MPEG-4, or 4×4 DCTs for H.264 in the main level profile).

The DCT is probably the most widely used transform, because it offers a good compression efficiency in a wide variety of coding situations, especially at medium and high bitrates. However, at low bitrates, the hybrid motion-compensated DCT structure may not be able to deliver an artifact-free sequence for two reasons. First, the structure of the motion-compensated inter prediction grid becomes visible, with blocking artifacts. Moreover, the block edges of the DCT basis functions become visible in the image grid, because too few coefficients are quantized, and too coarsely, to make up for these blocking artifacts and to reconstruct smooth objects in the image.

The document “Very low bit-rate video coding based on matching pursuits”, R. Neff and A. Zakhor, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, February 1997, pp. 158-171, describes a new motion-compensated system including a video compression algorithm based on the so-called matching pursuit (MP) algorithm, a technique developed about ten years ago (see the document “Matching pursuits with time-frequency dictionaries”, S. G. Mallat and Z. Zhang, IEEE Transactions on Signal Processing, vol. 41, no. 12, December 1993, pp. 3397-3414). Said technique provides a way to iteratively decompose any function or signal (for example, image, video, ...) into a linear expansion of waveforms belonging to a redundant dictionary of basis functions, well localized both in time and frequency and called atoms. A general family of time-frequency atoms can be created by scaling, translating and modulating a single function $g(t) \in L^{2}(\mathbb{R})$ supposed to be real and continuously differentiable. These dictionary functions may be designated by:

$\begin{matrix}{g_{\gamma}(t) \in G \quad (G = \text{dictionary set}),} & (1)\end{matrix}$

$\gamma$ (gamma) being an indexing parameter associated with each particular dictionary element (or atom). As described in the first cited document, assuming that the functions $g_{\gamma}(t)$ have unit norm, i.e. $\langle g_{\gamma}(t), g_{\gamma}(t) \rangle = 1$, the decomposition of a one-dimensional time signal $f(t)$ begins by choosing $\gamma$ to maximize the absolute value of the following inner product:

$\begin{matrix}{p = \langle f(t), g_{\gamma}(t) \rangle,} & (2)\end{matrix}$

where $p$ is called an expansion coefficient for the signal $f(t)$ onto the dictionary function $g_{\gamma}(t)$. A residual signal $R$ is then computed:

$\begin{matrix}{R(t) = f(t) - p \cdot g_{\gamma}(t)} & (3)\end{matrix}$

and this residual signal is expanded in the same way as the original signal $f(t)$. An atom is, in fact, the name given to each pair $(\gamma_{k}, p_{k})$, where $k$ is the rank of the iteration in the matching pursuit procedure. After a total of M stages of this iterative procedure (where each stage n yields a dictionary structure specified by $\gamma_{n}$, an expansion coefficient $p_{n}$ and a residual $R_{n}$ which is passed on to the next stage), the original signal $f(t)$ can be approximated by a signal $\hat{f}(t)$ which is a linear combination of the dictionary elements thus obtained. The iterative procedure is stopped when a predefined condition is met, for example either a set number of expansion coefficients is generated or some energy threshold for the residual is reached.
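The iterative procedure of equations (1) to (3) can be illustrated by the following minimal Python sketch. It is not taken from the cited documents: the function names (`normalize`, `inner`, `matching_pursuit`, `reconstruct`), the representation of the dictionary as a list of unit-norm sample vectors, and the two stopping parameters are assumptions made for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm (the unit-norm assumption of the text)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner(u, v):
    """Discrete inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(f, dictionary, max_atoms, energy_threshold=0.0):
    """Greedy decomposition of f over a list of unit-norm dictionary vectors.

    Returns (atoms, residual), where atoms is a list of (gamma, p) pairs.
    Stops after max_atoms stages or once the residual energy falls to
    energy_threshold or below (both are hypothetical stopping parameters).
    """
    residual = list(f)
    atoms = []
    for _ in range(max_atoms):
        if inner(residual, residual) <= energy_threshold:
            break
        # Equation (2): choose gamma maximizing |<R, g_gamma>|.
        gamma = max(range(len(dictionary)),
                    key=lambda k: abs(inner(residual, dictionary[k])))
        p = inner(residual, dictionary[gamma])
        # Equation (3): subtract the selected component from the residual.
        residual = [r - p * g for r, g in zip(residual, dictionary[gamma])]
        atoms.append((gamma, p))
    return atoms, residual


def reconstruct(atoms, dictionary, length):
    """Linear combination of the selected dictionary elements."""
    approx = [0.0] * length
    for gamma, p in atoms:
        approx = [a + p * g for a, g in zip(approx, dictionary[gamma])]
    return approx
```

In this sketch the expansion coefficients are left unquantized; a coder would quantize each coefficient before updating the residual.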

In the first document mentioned above, describing a system based on said MP algorithm and which performs better than DCT-based ones at low bitrates, original images are first motion-compensated, using a tool called overlapped block-motion compensation which avoids or reduces blocking artifacts by blending the boundaries of predicted/displaced blocks (the edges of the blocks are therefore smoothed and the block grid is less visible). After the motion prediction image is formed, it is subtracted from the original one, in order to produce the motion residual. Said residual is then coded, using the MP algorithm extended to the discrete two-dimensional (2D) domain, with a proper choice of a basis dictionary (said dictionary consists of an overcomplete collection of 2D separable Gabor functions g, shown in FIG. 1).

A residual signal f is then reconstructed by means of a linear combination of M dictionary elements:

$\begin{matrix}{\hat{f} = {\sum\limits_{n = 1}^{n = M}{{\hat{p}}_{n} \cdot g_{\gamma_{n}}}}} & (4)\end{matrix}$

If the dictionary basis functions have unit norm, $\hat{p}_{n}$ is the quantized inner product $\langle \cdot , \cdot \rangle$ between the basis function $g_{\gamma_{n}}$ and the residual updated iteratively, that is to say:

$\begin{matrix}{p_{n} = \left\langle {f - {\sum\limits_{k = 1}^{k = {n - 1}}{{\hat{p}}_{k} \cdot g_{\gamma_{k}}}}},\; g_{\gamma_{n}} \right\rangle} & (5)\end{matrix}$

the pairs $(\hat{p}_{n}, \gamma_{n})$ being the atoms. In the work described by the authors of the document, no restriction is placed on the possible location of an atom in an image (see FIG. 2). The 2D Gabor functions forming the dictionary set are defined in terms of a prototype Gaussian window:

$\begin{matrix}{w(t) = \sqrt[4]{2} \cdot e^{- \pi t^{2}}} & (6)\end{matrix}$

A mono-dimensional (1D) discrete Gabor function is defined as a scaled, modulated Gaussian window:

$\begin{matrix}{g_{\vec{\alpha}}(i) = K_{\vec{\alpha}} \cdot w\left( \frac{i - \frac{N}{2} + 1}{s} \right) \cdot \cos\left( \frac{2\pi\xi\left( i - \frac{N}{2} + 1 \right)}{N} + \phi \right) \quad \text{with: } i \in \left\{ 0, 1, \ldots, N - 1 \right\}.} & (7)\end{matrix}$

The constant $K_{\vec{\alpha}}$ is chosen so that $g_{\vec{\alpha}}(i)$ is of unit norm, and $\vec{\alpha} = (s, \xi, \phi)$ is a triple consisting, respectively, of a positive scale, a modulation frequency, and a phase shift. If S is the set of all such triples $\vec{\alpha}$, then the 2D separable Gabor functions of the dictionary have the following form:

$\begin{matrix}{G_{\vec{\alpha},\vec{\beta}}(i,j) = g_{\vec{\alpha}}(i)\, g_{\vec{\beta}}(j) \quad \text{for } i,j \in \left\{ 0, 1, \ldots, N - 1 \right\}, \text{ and } \vec{\alpha}, \vec{\beta} \in S} & (8)\end{matrix}$

The set of available dictionary triples and associated sizes (in pixels) indicated in the document as forming the 1D basis set (or dictionary) is shown in the following Table 1:

TABLE 1

 k     s_k     ξ_k    φ_k    size (pixels)
 0      1.0    0.0    0       1
 1      3.0    0.0    0       5
 2      5.0    0.0    0       9
 3      7.0    0.0    0      11
 4      9.0    0.0    0      15
 5     12.0    0.0    0      21
 6     14.0    0.0    0      23
 7     17.0    0.0    0      29
 8     20.0    0.0    0      35
 9      1.4    1.0    π/2     3
10      5.0    1.0    π/2     9
11     12.0    1.0    π/2    21
12     16.0    1.0    π/2    27
13     20.0    1.0    π/2    35
14      4.0    2.0    0       7
15      4.0    3.0    0       7
16      8.0    3.0    0      13
17      4.0    4.0    0       7
18      4.0    2.0    π/4     7
19      4.0    4.0    π/4     7

To obtain this parameter set, a training set of motion residual images was decomposed using a dictionary derived from a much larger set of parameter triples. The dictionary elements which were most often matched to the training images were retained in the reduced set. The obtained dictionary was specifically designed so that atoms can freely match the structure of motion residual images when their influence is not confined to the boundaries of the block they lie in (see FIG. 2, showing the example of an atom placed in a block-divided image without block-restrictions).
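The construction of the dictionary from equations (7) and (8) and the triples of Table 1 can be sketched as follows. This is an illustrative reading only: taking N equal to the size column of Table 1, allowing the two 1D factors of a 2D atom to have different lengths, and the names `TABLE_1`, `gabor_1d` and `gabor_2d` are all assumptions. The constant factor of the prototype window is omitted because the normalization constant K absorbs it.

```python
import math

# (s, xi, phi, size in pixels) for k = 0..19, copied from Table 1.
TABLE_1 = [
    (1.0, 0.0, 0.0, 1), (3.0, 0.0, 0.0, 5), (5.0, 0.0, 0.0, 9),
    (7.0, 0.0, 0.0, 11), (9.0, 0.0, 0.0, 15), (12.0, 0.0, 0.0, 21),
    (14.0, 0.0, 0.0, 23), (17.0, 0.0, 0.0, 29), (20.0, 0.0, 0.0, 35),
    (1.4, 1.0, math.pi / 2, 3), (5.0, 1.0, math.pi / 2, 9),
    (12.0, 1.0, math.pi / 2, 21), (16.0, 1.0, math.pi / 2, 27),
    (20.0, 1.0, math.pi / 2, 35), (4.0, 2.0, 0.0, 7), (4.0, 3.0, 0.0, 7),
    (8.0, 3.0, 0.0, 13), (4.0, 4.0, 0.0, 7), (4.0, 2.0, math.pi / 4, 7),
    (4.0, 4.0, math.pi / 4, 7),
]


def gaussian_window(t):
    """Prototype Gaussian window, up to a constant factor absorbed by K."""
    return math.exp(-math.pi * t * t)


def gabor_1d(s, xi, phi, n):
    """Equation (7): scaled, modulated Gaussian sampled at i = 0..n-1, unit norm."""
    samples = []
    for i in range(n):
        c = i - n / 2 + 1  # shifted sample position used in equation (7)
        samples.append(gaussian_window(c / s)
                       * math.cos(2 * math.pi * xi * c / n + phi))
    k = 1.0 / math.sqrt(sum(v * v for v in samples))  # normalization constant K
    return [k * v for v in samples]


def gabor_2d(alpha, beta):
    """Equation (8): separable 2D atom G(i, j) = g_alpha(i) * g_beta(j)."""
    g_a = gabor_1d(*alpha)
    g_b = gabor_1d(*beta)
    return [[a * b for b in g_b] for a in g_a]
```

With 20 one-dimensional triples, the separable construction yields 20 × 20 = 400 two-dimensional basis functions, which matches the count given for FIG. 1.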

However, the approach described in the cited document suffers from several limitations. The first one is related to the continuous structure of the Gabor dictionary. Because atoms can be placed at all pixel locations without any restriction and therefore span several motion-compensated blocks, the MP algorithm cannot represent blocking artifacts in the residual signal with a limited number of smooth atoms. This is the reason why it is necessary to have some kind of overlapped motion estimation, in order to limit the blocking artifacts. Second, if a classical block-based motion compensation (i.e. without overlapping windows) is used, the smooth basis functions may not be appropriate to make up for blocking artifacts (indeed, it has recently been shown that coding gains could be made when the size of the residual coding transform is matched to the size of the motion-compensated block). Third, it is difficult to combine intra and inter blocks in a coded frame (in the cited document, no DCT intra macroblock exists, probably in order to avoid discontinuities on the boundaries of blocks coded in intra and inter mode that would be badly modelled by the smooth structure of Gabor basis functions).

SUMMARY OF THE INVENTION

It is therefore an object of the invention to propose a video encoding method in which these limitations no longer exist.

To this end, the invention relates to a video encoding method such as defined in the introductory part of the description and which is moreover such that, when using said MP algorithm, any atom acts only on one block B at a time, said block-restriction leading to the fact that the reconstruction of a residual signal f is obtained from a dictionary that is composed of basis functions $g_{\gamma_{n}}|_{B}$ restricted to the block B corresponding to the indexing parameter $\gamma_{n}$, according to the following 2D spatial domain operation:

$g_{\gamma_{n}}|_{B}(i,j) = g_{\gamma_{n}}(i,j)$ if pixel $(i,j) \in B$

$g_{\gamma_{n}}|_{B}(i,j) = 0$ otherwise (i.e. $(i,j) \notin B$).

The main interest of this approach resides in the fact that the MP atoms are restricted to the motion-compensated blocks. This makes it possible to better model the blocky structure of residual signals, implicitly augments the dictionary diversity for the same coding cost and offers the possibility of alternating MP and DCT transforms since there is no interference across block boundaries. It also avoids the need to resort to overlapped motion compensation to limit blocking artifacts.

It is another object of the invention to propose a video encoding device for carrying out said encoding method.

It is still another object of the invention to propose a video decoding method and device for decoding signals coded by means of said video encoding method and device.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described, by way of example, with reference to the accompanying drawings in which:

FIG. 1 allows a visualization of the 400 basis functions of the 2D Gabor dictionary used in the implementation of the matching pursuit algorithm;

FIG. 2 illustrates the example of an atom placed in a block-divided image without block-restrictions;

FIG. 3 illustrates an example of a hybrid video coder according to the invention;

FIG. 4 shows an example of a video encoding device for implementing a matching pursuit (MP) algorithm;

FIG. 5 illustrates the case of a block-restricted matching pursuit residual coding, with an atom being confined into the motion-compensated grid and acting only on a block at a time;

FIG. 6 illustrates an example of a hybrid video decoder according to the invention;

FIG. 7 shows an example of a video decoding device implementing the MP algorithm.

DETAILED DESCRIPTION OF THE INVENTION

A simplified block diagram of a video encoding device implementing a hybrid video coder using multiple coding engines is shown in FIG. 3. Several coding engines implement predetermined coding techniques, for instance a coding engine 31 can implement the INTRA-DCT coding method, a second one 32 the INTER-DCT coding method, and a third one 33 the matching pursuit algorithm. Each frame of the input video sequence is received (“video signal”) by a block partitioner device 34, which partitions the image into individual blocks of varying size, and decides which coding engine will process the current original block. The decisions representing the block position, its size and the selected coding engine are then inserted into the bitstream by a coding device 35. The current original signal block is then transferred to the selected coding engine (the engine 33 in the situation illustrated in FIG. 3).

A matching pursuit coding engine is further illustrated as a simplified block diagram in FIG. 4, showing an example of a video encoding device for implementing an MP algorithm. Each of the original signal blocks of the input video sequence assigned to the coding engine 33 is received on one side by motion compensating means 41 for determining motion vectors (said motion vectors are conventionally found using the block matching algorithm), and the vectors thus obtained are coded by motion vector coding means 42, the coded vectors being delivered to a multiplexer 43 (referenced, but not shown). On the other side, a subtracter 44 delivers on its output the residual signal between the current image and its prediction. Said residual signal is then decomposed into atoms (the dictionary of atoms is referenced 47) and the atom parameters thus determined (module 45) are coded (module 46). The coded motion vectors and atom parameters then form a bitstream that is sent to match a predefined condition for each frame of the sequence.
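The motion vector search and the subtraction performed by the subtracter 44 can be sketched, under stated assumptions, as a full-search block matching followed by a block difference. The sum of absolute differences criterion, the square block, the search range parameter and the function names (`sad`, `block_match`, `block_residual`) are illustrative choices, not details given in the description. Frames are represented as lists of rows of pixel values.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[top + y][left + x]
                         - ref[top + dy + y][left + dx + x])
    return total


def block_match(cur, ref, top, left, size, search):
    """Full search over displacements in [-search, search]; returns the best (dy, dx)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidates that would read outside the reference frame.
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def block_residual(cur, ref, top, left, size, mv):
    """Residual block: current block minus its motion-compensated prediction."""
    dy, dx = mv
    return [[cur[top + y][left + x] - ref[top + dy + y][left + dx + x]
             for x in range(size)]
            for y in range(size)]
```

The residual block returned by `block_residual` is the signal that would then be passed to the atom decomposition.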

This encoding engine 33 carries out a method of coding an input bitstream that comprises the following steps. First, as in most coding structures, the original frames of the input sequence are motion-compensated (each one is motion-compensated on the basis of the previous reconstructed frame, and the motion vectors determined during said motion compensation step are stored in view of their later transmission). Residual signals are then generated by difference between the current frame and the associated motion-compensated prediction. Each of said residual signals is then compared with a dictionary of functions consisting of a collection of 2D separable Gabor functions, in order to generate a dictionary structure $g_{\gamma}(t)$ specified by the indexing parameter $\gamma_{n}$, an expansion coefficient $p_{n}$ and a residual $R_{n}(t) - p \cdot g_{\gamma}(t)$ which is passed on to the next stage of this iterative procedure. Once the atom parameters are found, they can be coded (together with the motion vectors previously determined), the coded signals thus obtained forming the bitstream sent to the decoder.

The technical solution proposed according to the invention consists in confining the influence of atoms to the boundaries of the block they lie in. This block-restriction means that an atom acts only on one block at a time, confined into the motion-compensation grid, as illustrated in FIG. 5. This block-restriction modifies the signal matching pursuit algorithm in the following manner.

Assuming that one wants to obtain the MP decomposition of the 2D residual in a block B of size M×N pixels after motion compensation, and denoting by $G|_{B}$ the MP dictionary restricted to B, the elements $g_{\gamma_{n}}|_{B}$ of said dictionary are obtained by means of the relationships (9) and (10):

$\begin{matrix}{g_{\gamma_{n}}|_{B}(i,j) = g_{\gamma_{n}}(i,j) \quad \text{if pixel } (i,j) \in B} & (9)\end{matrix}$

$\begin{matrix}{g_{\gamma_{n}}|_{B}(i,j) = 0 \quad \text{otherwise (i.e. } (i,j) \notin B\text{)}} & (10)\end{matrix}$

In this case, since $g_{\gamma_{n}}|_{B}$ does not necessarily have a unit norm, $p_{n}$ needs to be reweighted as:

$p_{n} = \frac{\left\langle {f - {\sum\limits_{k = 1}^{k = {n - 1}}{{\hat{p}}_{k} \cdot g_{\gamma_{k}}|_{B}}}},\; g_{\gamma_{n}}|_{B} \right\rangle}{\sqrt{\left\langle g_{\gamma_{n}}|_{B},\; g_{\gamma_{n}}|_{B} \right\rangle}}$

The interest of this approach resides in the fact that because a single atom cannot span several blocks, it does not have to deal with the high-frequency discontinuities at block edges. Instead, it can be adapted to block boundaries, and even to block sizes, by designing block-size dependent dictionaries. Moreover, since overlapped motion compensation is no longer mandatory to preserve the MP efficiency, classical motion compensation may be used.
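The restriction of relationships (9) and (10) and the reweighting of the coefficient can be illustrated by the following sketch. The frame-sized list-of-rows representation, the block descriptor `(top, left, height, width)` and the function names are assumptions introduced for illustration; the `residual` argument stands for the already updated residual, i.e. f minus the contributions of the previously selected atoms.

```python
import math


def place_atom(atom, top, left, height, width):
    """Embed a 2D atom in an otherwise zero frame-sized array, clipped at the frame edges."""
    frame = [[0.0] * width for _ in range(height)]
    for y, row in enumerate(atom):
        for x, value in enumerate(row):
            if 0 <= top + y < height and 0 <= left + x < width:
                frame[top + y][left + x] = value
    return frame


def restrict_to_block(g, block):
    """Relationships (9) and (10): keep g(i, j) inside block B, zero elsewhere."""
    b_top, b_left, b_height, b_width = block
    return [[value if (b_top <= i < b_top + b_height
                       and b_left <= j < b_left + b_width) else 0.0
             for j, value in enumerate(row)]
            for i, row in enumerate(g)]


def inner_2d(a, b):
    """Inner product of two 2D arrays of identical shape."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def reweighted_coefficient(residual, g_restricted):
    """Inner product with the restricted atom, divided by the norm of the restricted atom."""
    energy = inner_2d(g_restricted, g_restricted)
    if energy == 0.0:
        return 0.0  # the atom lies entirely outside the block
    return inner_2d(residual, g_restricted) / math.sqrt(energy)
```

For example, a unit-norm 2×2 atom with all samples equal to 0.5, placed so that only one of its samples falls inside B, has a restricted norm of 0.5; dividing by that norm gives the same coefficient as projecting onto the renormalized restricted atom.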

The preferred embodiment of the encoding device described above sends a bitstream which is received by a corresponding decoding device. A simplified block diagram of a video decoding device according to the invention and implementing a hybrid video decoder using multiple decoding engines is shown in FIG. 6. The transmitted bitstream is received on one side by a block partition decoding device 64, which decodes the current block position, its size, and the decoding method. Given the decoding method, the bitstream elements are then transferred to the corresponding decoding engine, 61 or 62 or 63 in the case of FIG. 6, which will in turn decode the assigned blocks and output the video signal reconstructed block. The available decoding engines can be for instance an INTRA-DCT block decoder 61, an INTER-DCT block decoder 62, and a matching pursuit block decoder 63.

An example of a matching pursuit decoding engine is further illustrated in FIG. 7, showing an example of a video decoding device implementing the MP algorithm. The bitstream elements are received by an entropy decoder device 71, which forwards the decoded atom parameters to an atom device 72 (the dictionary of atoms is referenced 73) which reconstructs the matching pursuit functions at the decoded position within the assigned video block to form the decoded residual signal. The entropy decoder device also outputs motion vectors which are fed into a motion compensation device 74 to form a motion prediction signal from previously reconstructed video signals. The motion prediction and the reconstructed residual signal are then summed in an adder 75 to produce a video signal reconstructed block.
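The reconstruction performed by the atom device 72 and the adder 75 can be sketched as follows. The atom tuple layout `(p_hat, gamma, top, left)`, the dictionary as a list of 2D arrays and the function name `decode_block` are hypothetical; the sketch applies each decoded coefficient directly to the block-restricted function, following the linear combination of restricted basis functions stated in the description, and assumes no further renormalization at the decoder.

```python
def decode_block(prediction, atoms, dictionary):
    """Rebuild one block from its motion prediction and its decoded atoms.

    prediction: 2D list, motion-compensated prediction of the block.
    atoms: list of (p_hat, gamma, top, left), positions relative to the block.
    dictionary: list of 2D atoms indexed by gamma.
    """
    height, width = len(prediction), len(prediction[0])
    residual = [[0.0] * width for _ in range(height)]
    for p_hat, gamma, top, left in atoms:
        for y, row in enumerate(dictionary[gamma]):
            for x, value in enumerate(row):
                i, j = top + y, left + x
                # Block restriction: samples falling outside the block are dropped.
                if 0 <= i < height and 0 <= j < width:
                    residual[i][j] += p_hat * value
    # Adder: motion prediction plus reconstructed residual.
    return [[prediction[i][j] + residual[i][j] for j in range(width)]
            for i in range(height)]
```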

1. A video encoding method applied to an input sequence of frames in which each frame is subdivided into blocks of arbitrary size, said method comprising for at least a part of said blocks of the current frame the steps of: generating on a block basis motion-compensated frames, each one being obtained from each current original frame and a previous reconstructed frame; generating from said motion-compensated frames residual signals; using a so-called matching pursuit (MP) algorithm for decomposing each of said generated residual signals into coded dictionary functions called atoms, the other blocks of the current frame being processed by means of other coding techniques; coding said atoms and the motion vectors determined during the motion compensation step, for generating an output coded bitstream; said method being such that, when using said MP algorithm, any atom acts only on one block B at a time, said block-restriction leading to the fact that the reconstruction of a residual signal f is obtained from a dictionary that is composed of basis functions $g_{\gamma_{n}}|_{B}$ restricted to the block B corresponding to the indexing parameter $\gamma_{n}$, according to the following 2D spatial domain operation: $g_{\gamma_{n}}|_{B}(i,j) = g_{\gamma_{n}}(i,j)$ if pixel $(i,j) \in B$; $g_{\gamma_{n}}|_{B}(i,j) = 0$ otherwise (i.e. $(i,j) \notin B$).
2. A video encoding device applied to an input sequence of frames in which each frame is subdivided into blocks of arbitrary size, said device being applied to at least a part of said blocks of the current frame and comprising: means for generating on a block basis, by means of a motion compensation step, motion-compensated frames, each one being obtained from each current original frame and a previous reconstructed frame; means for generating from said motion-compensated frames residual signals; means for performing a so-called matching pursuit (MP) algorithm for decomposing each of said generated residual signals into coded dictionary functions called atoms, the other blocks of the current frame being processed by means of other coding techniques; means for coding, for each concerned block, said atoms and the motion vectors determined during the motion compensation step, for generating an output coded bitstream; said device being such that, when using said MP algorithm, any atom acts only on one block B at a time, said block-restriction leading to the fact that the reconstruction of a residual signal f is obtained from a dictionary that is composed of basis functions $g_{\gamma_{n}}|_{B}$ restricted to the block B corresponding to the indexing parameter $\gamma_{n}$, according to the following 2D spatial domain operation: $g_{\gamma_{n}}|_{B}(i,j) = g_{\gamma_{n}}(i,j)$ if pixel $(i,j) \in B$; $g_{\gamma_{n}}|_{B}(i,j) = 0$ otherwise (i.e. $(i,j) \notin B$).
3. A video encoding device according to claim 2, characterized in that the quantized inner product $p_{n}$ of a dictionary element is reweighted as:

$p_{n} = \frac{\left\langle {f - {\sum\limits_{k = 1}^{k = {n - 1}}{{\hat{p}}_{k} \cdot g_{\gamma_{k}}|_{B}}}},\; g_{\gamma_{n}}|_{B} \right\rangle}{\sqrt{\left\langle g_{\gamma_{n}}|_{B},\; g_{\gamma_{n}}|_{B} \right\rangle}}.$

4. A video decoding method applied to a bitstream coded by means of a video coding method according to claim 1, said decoding method comprising, for the concerned blocks, the steps of: decoding the coded atom parameters and motion vectors contained in said coded bitstream; reconstructing from said decoded atom parameters the residual signals; generating motion compensated signals from said decoded motion vectors; generating video signal reconstructed blocks by summation of said residual signals and said motion compensated signals.
5. A video decoding device applied to a bitstream coded by means of a video encoding device according to claim 2, said decoding device being applied to the concerned blocks and comprising: means for decoding the coded atom parameters and motion vectors contained in said coded bitstream; means for reconstructing from said decoded atom parameters the residual signals; means for generating motion compensated signals from said decoded motion vectors; means for generating video signal reconstructed blocks by summation of said residual signals and said motion compensated signals.